Abstract

Motivated by recent studies of population coding in theoretical neuroscience, we examine the 
optimality of a recently described form of stochastic resonance known as suprathreshold stochastic 
resonance, which occurs in populations of noisy threshold devices such as models of sensory neurons. 
Using the mutual information measure, it is shown numerically that for a random input signal, 
the optimal threshold distribution contains singularities. For large enough noise, this distribution 
consists of a single point and hence the optimal encoding is realized by the suprathreshold stochastic 
resonance effect. Furthermore, it is shown that a bifurcational pattern appears in the optimal 
threshold settings as the noise intensity increases. Fisher information is used to examine the 
behavior of the optimal threshold distribution as the population size approaches infinity. 

PACS numbers: 05.40.Ca, 02.40.Xx, 87.19.La, 89.70.+c
A fascinating aspect of the behavior of populations of neurons is their ability to respond reliably in the presence of very low signal-to-noise ratios [...]. Many studies have established that improved performance in individual neurons can be achieved in the presence of large ambient noise, by a mechanism known as Stochastic Resonance (SR) [...]. Motivated by the established fact that a single stimulus can induce a response in many sensory neurons, we aim to link SR with a major branch of theoretical neuroscience, that of population coding, in which one of the aims is to understand how populations of neurons encode input stimuli [...]. The approach we take is to analyze the optimal encoding of a random input signal by a population of simple threshold devices, or comparators, all of which are subject to additive iid input noise. Although this approach greatly simplifies the dynamics of realistic neural models, it does capture the main nonlinearity: a threshold that generates an output spike when crossed. A natural measure to use, and one that has recently been used extensively in computational neuroscience [...], is mutual information.

The vast majority of studies on SR in static nonlinearities and neurons have been restricted to the case of subthreshold signals since, for a single device and suprathreshold stimuli, noise-enhanced signal transmission disappears. Hence, it has been pointed out that stochastic resonance is a suboptimal means of improving system performance, since optimal performance can be gained by adjusting the threshold value [...]. However, we show here that this does not necessarily apply to systems consisting of more than one threshold device receiving the same input signal and subject to independent noise. It has previously been shown for suprathreshold signal levels in such a system that the mutual information between the input and output signals has a maximum value for nonzero noise intensity. This phenomenon was termed Suprathreshold Stochastic Resonance (SSR) to illustrate the fact that it is a form of stochastic resonance that is not restricted to subthreshold signals [...]. Subsequently, the effect was also shown to occur in FitzHugh-Nagumo model neurons and was applied to cochlear implant encoding [...].

Here, for the first time, we discuss the optimality of SSR by examining whether the mutual information can be increased by adjusting the thresholds, which we denote as $\{\theta_n\}$, $n = 1, \ldots, N$, while keeping the noise constant. Using a Gaussian signal and iid Gaussian noise, we show numerically that above a certain noise intensity the optimal threshold settings occur when all thresholds are equal to the signal mean. Hence, we show that SSR is a case of SR in which making use of the ambient noise truly provides the optimal system response. Furthermore, we show that as the noise increases from zero (where the optimal thresholds are widely distributed across the dynamic range of the signal), the optimal threshold settings go through a number of transitions in which more and more threshold values accumulate at fewer and fewer points, in a series of bifurcations. Such an accumulation of optimal thresholds at a small number of point singularities appears to persist as N approaches infinity; hence, in this case there must be a transition from continuous to singular solutions for the optimal threshold values.

The model in which SSR occurs is shown in Fig. 1. It consists of N threshold devices, all receiving the same sample of a random input signal x with pdf $P(x)$, where the n-th device is subject to continuously valued iid additive noise, $\eta_n$ ($n = 1, \ldots, N$), with pdf $R(\eta)$, which is also independent of x. The output from each comparator, $y_n$, is unity if the input signal plus the noise is greater than the threshold, $\theta_n$, of that device and zero otherwise. The outputs from all comparators are summed to give the overall output signal, y. Hence, y is a discrete signal taking integer values from 0 to N, and is a nondeterministic and lossy encoding of the input signal. If we impose the constraint that all thresholds are set to the same value, the SSR effect can occur, in which case the mutual information has a maximum for nonzero noise intensity [7]. The effect is maximized when the thresholds are all set to the signal mean [10]. Such behavior has also been shown with other measures [...].
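The array itself is easy to simulate, which helps make the encoding concrete. The sketch below is a minimal Monte Carlo illustration, assuming zero-mean Gaussian noise and a unit-variance Gaussian signal (the text allows any continuous noise pdf $R(\eta)$); the function name `simulate_array` is hypothetical.

```python
import numpy as np


def simulate_array(x, thresholds, noise_std, rng):
    """Return the array output y for each signal sample in x.

    Every device sees the same sample x, adds its own independent noise
    eta_n (zero-mean Gaussian with std noise_std, an assumed noise model),
    and outputs 1 if x + eta_n > theta_n, else 0; y is the sum over devices.
    """
    x = np.asarray(x, dtype=float)[:, None]               # shape (samples, 1)
    theta = np.asarray(thresholds, dtype=float)[None, :]  # shape (1, N)
    eta = noise_std * rng.standard_normal((x.shape[0], theta.shape[1]))
    y_n = (x + eta > theta).astype(int)                   # individual outputs y_n
    return y_n.sum(axis=1)                                # y takes values 0, ..., N


rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)                           # signal samples, sigma_x = 1
y = simulate_array(x, np.zeros(15), noise_std=0.5, rng=rng)
```

With `noise_std=0` and all thresholds equal, every device makes the same decision, so y only takes the values 0 and N; nonzero noise spreads y over intermediate values, which is the mechanism SSR exploits.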

Our objective here, however, is to relax the constraint that all thresholds are identical, and to find the threshold values, $\{\theta_n\}$, that maximize the mutual information. This is expressed as

$$I(x, y) = -\sum_{n=0}^{N} P_y(n)\log_2 P_y(n) - \left(-\int_{-\infty}^{\infty} P(x)\sum_{n=0}^{N} P(y = n|x)\log_2 P(y = n|x)\,dx\right),$$

where $P(x)$ is assumed known and $P_y(n) = \int_{-\infty}^{\infty} P(y = n|x)P(x)\,dx$. Given this, the mutual information depends only on the conditional probability of the output given the input, $P(y|x)$. Let $P_n(x)$ be the probability of device n being "on" (that is, signal plus noise exceeding the threshold $\theta_n$) for a given value of the input signal x. Then

$$P_n(x) = \int_{\theta_n - x}^{\infty} R(\eta)\,d\eta = 1 - F_R(\theta_n - x), \qquad (1)$$
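As a minimal sketch of (1), assuming zero-mean Gaussian noise so that $F_R$ can be written with the error function (any other noise cdf could be passed instead), the helper names `gaussian_cdf` and `p_on` below are hypothetical:

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function


def gaussian_cdf(u, noise_std):
    """F_R(u) for zero-mean Gaussian noise with standard deviation noise_std > 0."""
    return 0.5 * (1.0 + _erf(np.asarray(u, dtype=float) / (math.sqrt(2.0) * noise_std)))


def p_on(x, thresholds, noise_cdf):
    """Eq. (1): P_n(x) = 1 - F_R(theta_n - x), returned with shape (len(x), N)."""
    x = np.asarray(x, dtype=float)[:, None]
    theta = np.asarray(thresholds, dtype=float)[None, :]
    return 1.0 - noise_cdf(theta - x)


P = p_on([-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5],
         lambda u: gaussian_cdf(u, noise_std=0.5))
```

Each column of `P` increases with x, each row decreases with the threshold, and a device is on with probability exactly 1/2 when x equals its threshold.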

where $F_R$ is the cumulative distribution function (cdf) of the noise and $n = 1, \ldots, N$. For the particular case in which all thresholds have the same value, each $P_n(x)$ has the same value for all n, and $P(y|x)$ is given by the binomial distribution [7]. In general, however, it is difficult to find analytical expressions for $P(y|x)$, and we rely on numerics. Given any arbitrary N, $R(\eta)$, and $\{\theta_n\}$, the set $\{P_n(x)\}$ can be calculated exactly for any value of x from (1), from which $P(y|x)$ can be found using an efficient recursive formula [...], and hence the mutual information can be calculated numerically. Our problem of finding the threshold settings that maximize the mutual information can now be expressed as a nonlinear optimization problem, where the cost function to maximize is the mutual information, and there are structural constraints on how $P(y|x)$ is obtained:

$$
\begin{aligned}
\text{Find:}\quad & \max_{\{\theta_n\}} I(x, y) \\
\text{subject to:}\quad & P(y|x) \text{ is a function of } \{P_n(x)\}, \\
& \sum_{n=0}^{N} P(y = n|x) = 1 \quad \forall x, \\
\text{and}\quad & \{\theta_n\} \in \mathbb{R}^N.
\end{aligned}
\qquad (2)
$$
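The cost function in (2) can be evaluated for an arbitrary threshold set by combining (1) with a recursion over devices. The sketch below assumes a unit-variance Gaussian signal and Gaussian noise, approximates the integrals by sums on a uniform grid, and uses the standard Poisson-binomial recursion (adding one device at a time); it is offered as one possible recursive formula, not necessarily the one used here. The function names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def output_distribution(p_on):
    """P(y = n | x) for n = 0..N, given P_n(x) as an array of shape (len(x), N).

    After adding k devices, column m holds the probability that exactly m of
    those k devices are on.
    """
    n_x, n_dev = p_on.shape
    dist = np.ones((n_x, 1))
    for k in range(n_dev):
        p = p_on[:, k:k + 1]
        new = np.zeros((n_x, k + 2))
        new[:, :-1] += dist * (1.0 - p)  # device k is off: count unchanged
        new[:, 1:] += dist * p           # device k is on: count increases by one
        dist = new
    return dist


def _plogp(p):
    """Element-wise p * log2(p) with the convention 0 log 0 = 0."""
    return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)


def mutual_information(thresholds, noise_std, x=None):
    """I(x, y) in bits for a unit-variance Gaussian signal and Gaussian noise (noise_std > 0)."""
    if x is None:
        x = np.linspace(-8.0, 8.0, 4000)  # symmetric grid that avoids x = 0
    w = np.exp(-0.5 * x**2)
    w /= w.sum()                          # quadrature weights for P(x) dx
    theta = np.asarray(thresholds, dtype=float)
    p = 0.5 * (1.0 + _erf((x[:, None] - theta[None, :]) / (math.sqrt(2.0) * noise_std)))
    cond = output_distribution(p)         # P(y = n | x) on the grid
    p_y = w @ cond                        # P_y(n)
    h_y = -_plogp(p_y).sum()
    h_y_given_x = -(w * _plogp(cond).sum(axis=1)).sum()
    return h_y - h_y_given_x
```

For example, `mutual_information([0.0], 1e-3)` is close to 1 bit (a single nearly noiseless comparator at the signal mean), and no threshold set can exceed $\log_2(N + 1)$ bits.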

This formulation is similar to previous work on clustering and neural coding problems solved using a method known as deterministic annealing [...]. In particular, the formulation reached in [...] can be expressed in a form identical to (2) with the structural constraints removed. Because our problem retains these constraints, the solution method used in that work to find the optimal conditional distribution, $P(y|x)$, cannot be used here; instead, we concentrate on optimizing the only free variable, the set $\{\theta_n\}$. To this end, we have successfully applied a random search method based on simulated annealing to find near-optimal solutions to (2) for any given N, $P(x)$, and noise intensity. We have found this method to be highly efficient and effective, and have tested it against other solution methods that do not rely on any assumption of local convexity in the solution space, including a genetic algorithm. These methods always provide very nearly the same solution as our method, but are far slower to converge.
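A random search of this kind can be sketched as Metropolis-style simulated annealing over $\{\theta_n\}$. The details of the method used here are not given, so the proposal step, temperature schedule, iteration count, and grid below are all assumptions, as are the names `mutual_information` and `anneal_thresholds`; the signal is taken as unit-variance Gaussian and the noise as Gaussian.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)
X_GRID = np.linspace(-6.0, 6.0, 601)  # quadrature grid for the signal (assumed truncation)
_PX = np.exp(-0.5 * X_GRID**2)
W = _PX / _PX.sum()                   # weights approximating P(x) dx


def _plogp(p):
    """Element-wise p * log2(p) with 0 log 0 = 0."""
    return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)


def mutual_information(thresholds, noise_std):
    """I(x, y) in bits via Eq. (1) and a recursion over devices (noise_std > 0)."""
    p_on = 0.5 * (1.0 + _erf((X_GRID[:, None] - np.asarray(thresholds)[None, :])
                             / (math.sqrt(2.0) * noise_std)))
    cond = np.ones((X_GRID.size, 1))
    for k in range(p_on.shape[1]):    # add devices one at a time
        p = p_on[:, k:k + 1]
        new = np.zeros((X_GRID.size, k + 2))
        new[:, :-1] += cond * (1.0 - p)
        new[:, 1:] += cond * p
        cond = new
    p_y = W @ cond
    return -_plogp(p_y).sum() + (W * _plogp(cond).sum(axis=1)).sum()


def anneal_thresholds(n_devices, noise_std, n_iter=2000, step=0.2, temp0=0.05, seed=0):
    """Simulated-annealing random search over {theta_n}; returns sorted thresholds and I."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n_devices)              # random initial thresholds
    current = mutual_information(theta, noise_std)
    best_theta, best = theta.copy(), current
    for k in range(n_iter):
        temp = temp0 * (1.0 - k / n_iter) + 1e-12   # linear cooling schedule (assumed)
        cand = theta.copy()
        cand[rng.integers(n_devices)] += step * rng.normal()  # perturb one threshold
        value = mutual_information(cand, noise_std)
        # Metropolis rule: always accept improvements, occasionally accept losses.
        if value >= current or rng.random() < math.exp((value - current) / temp):
            theta, current = cand, value
            if current > best:
                best_theta, best = theta.copy(), current
    return np.sort(best_theta), best
```

For instance, `anneal_thresholds(15, noise_std=0.5)` searches the N = 15 configuration at one noise level. Accepting occasional losses early on is what lets the search escape local optima, which matters here because the text reports several competing optima near bifurcations.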

We present results for the case of iid zero-mean Gaussian signal and noise distributions. If the noise has variance $\sigma_\eta^2$, then $P_n(x) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{x - \theta_n}{\sqrt{2}\,\sigma_\eta}\right)$, where erf is the error function. Let $\sigma = \sigma_\eta/\sigma_x$, where $\sigma_x^2$ is the variance of the Gaussian signal. Note that for the case of $\sigma_\eta = 0$ it is possible to determine the optimal thresholds analytically [11], and that in this case each threshold has a unique value. Fig. 2 shows our results for the optimal threshold settings for N = 15, plotted against increasing $\sigma$. Several interesting features are present in these results.
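For comparison with Fig. 2, the constrained SSR setting (all N thresholds at the signal mean) can be evaluated directly, because $P(y|x)$ is then binomial. The sketch below assumes $\sigma_x = 1$, so that the noise standard deviation equals $\sigma$, and uses grid quadrature; `ssr_information` is a hypothetical name.

```python
import math

import numpy as np


def _plogp(p):
    """Element-wise p * log2(p) with 0 log 0 = 0."""
    return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)


def ssr_information(n_devices, sigma):
    """I(x, y) in bits with all thresholds at 0, sigma_x = 1, Gaussian noise std sigma."""
    x = np.linspace(-8.0, 8.0, 4000)       # symmetric grid avoiding x = 0
    w = np.exp(-0.5 * x**2)
    w /= w.sum()
    if sigma == 0:
        p = (x > 0).astype(float)          # noiseless limit: every device agrees
    else:
        p = 0.5 * (1.0 + np.vectorize(math.erf)(x / (math.sqrt(2.0) * sigma)))
    n = np.arange(n_devices + 1)
    coeff = np.array([math.comb(n_devices, k) for k in n], dtype=float)
    cond = coeff * p[:, None] ** n * (1.0 - p[:, None]) ** (n_devices - n)  # binomial P(y=n|x)
    p_y = w @ cond
    return -_plogp(p_y).sum() + (w * _plogp(cond).sum(axis=1)).sum()


for s in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"sigma = {s:4.2f}  I = {ssr_information(15, s):.3f} bits")
```

At $\sigma = 0$ the identical thresholds act as a single comparator and give 1 bit; moderate noise raises the information above that level, which is the SSR effect against which the unconstrained optimum in Fig. 2 is compared.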

Firstly, for increasing noise, the optimal thresholds cluster at particular values, each of which we will denote as an accumulation point. There are also bifurcations for increasing noise. At each bifurcation, the number of accumulation points decreases, so that overall the noise intensity is divided into m < N distinct regions, each with $k_m$ accumulation points. The fraction of thresholds at each accumulation point is shown in Fig. 3 for various values of $\sigma$. For noise greater than the final bifurcation point, the SSR region occurs; that is, the optimal solution is for all thresholds to be equal to the signal mean of zero. It can be seen that sometimes a continuous bifurcation occurs and one accumulation point splits into two (as noise decreases), and on other occasions a discontinuous bifurcation occurs.

We also note that there are regions where the optimal thresholds are not symmetric about the x-axis. Furthermore, our results show the existence of at least two distinct globally optimal threshold settings, giving identical mutual information, for certain values of $\sigma$. This can occur at a bifurcation point, where two different sizes of accumulation points can provide the same mutual information, or between bifurcations, where the two solutions are sets of thresholds that are simply the negatives of each other. The bifurcational structure is quite surprising, but appears to be fundamental to the problem type, as we have found similar bifurcation structures in the optimal threshold settings for measures other than mutual information, including the correlation coefficient and mean square distortion (error variance), and for other signal and noise densities. Finally, it is evident that above a certain value of $\sigma$ the SSR situation is optimal. That is, the optimal quantization for large noise is to set all thresholds to the signal mean. This result shows for the first time that stochastic resonance can be optimal in threshold systems.

To describe our results mathematically, define N states with labels z = n/N. Let the set of N optimal thresholds be ordered by increasing value to give the sequence $(\theta^*_z)$. In the absence of noise, it is straightforward to show that each optimal threshold is given by $\theta^*_z = F^{-1}(z)$, where $F^{-1}(\cdot)$ is the inverse cdf of the signal distribution.

We introduce a concept used in the theoretical analysis of high-resolution quantizers in information theory: the quantizer point density function, $\lambda(x)$, defined over the same variable as the source [...]. The point density function has the property that $\int \lambda(x)\,dx = 1$, and gives the density of thresholds across the signal dynamic range. For nonzero noise, we observe that for a given value of $\sigma$, our empirically optimal $(\theta^*_z)$ can take on at most $k(\sigma)$ unique values. As $\sigma$ increases, k decreases at each bifurcation. Denote by $\nu(j, \sigma)$ the fraction of thresholds assigned to the j-th accumulation point at noise intensity $\sigma$, where $j = 1, \ldots, k(\sigma)$ and $\sum_{j=1}^{k(\sigma)} \nu(j, \sigma) = 1$. Denote the value of $\theta^*$ for accumulation point j as $\Theta_j$, so that there are $N\nu(j, \sigma)$ thresholds at $x = \Theta_j$. Hence, we can write the point density function as a function of $\sigma$ as

$$\lambda(x, \sigma) = \sum_{j=1}^{k(\sigma)} \nu(j, \sigma)\,\delta(x - \Theta_j).$$
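Given a numerically optimized threshold set, the accumulation points $\Theta_j$ and fractions $\nu(j, \sigma)$ can be read off by grouping thresholds that coincide to within solver precision. A minimal sketch; the tolerance `tol`, the simple chaining rule, and the name `accumulation_points` are assumptions rather than part of the text:

```python
import numpy as np


def accumulation_points(thresholds, tol=1e-3):
    """Group nearly equal thresholds into accumulation points.

    Returns (locations Theta_j, fractions nu_j), the ingredients of
    lambda(x, sigma) = sum_j nu_j * delta(x - Theta_j). A sorted threshold
    joins the current group if it lies within tol of that group's last member.
    """
    theta = np.sort(np.asarray(thresholds, dtype=float))
    groups = [[theta[0]]]
    for t in theta[1:]:
        if t - groups[-1][-1] <= tol:
            groups[-1].append(t)   # same accumulation point as the previous threshold
        else:
            groups.append([t])     # start a new accumulation point
    locations = np.array([np.mean(g) for g in groups])
    fractions = np.array([len(g) for g in groups]) / theta.size
    return locations, fractions


loc, nu = accumulation_points([0.4, -0.4, 0.4002, -0.4001, 0.0])
```

The tolerance should be larger than the optimizer's precision but smaller than the gap between distinct accumulation points; the fractions always sum to one, matching $\int \lambda(x)\,dx = 1$.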



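We also note that $\int_{-\infty}^{a} \lambda(x, \sigma)\,dx$ is the fraction of thresholds with values less than or equal to a. For the special case of $\sigma = 0$, each of the N thresholds has a unique value, so we can write the analytically optimal point density function as

$$\lambda(x, 0) = \frac{1}{N}\sum_{z} \delta\left(x - F^{-1}(z)\right),$$

where the sum runs over the N labels z.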

We can make some analytical progress on the solution to (2) by allowing the population size N to approach infinity, a case which is biologically relevant [18]. Let $N \to \infty$, so that z is continuous in the region $z \in [0, 1]$. We can then say that the sequence of optimal thresholds $(\theta^*_z)$ defines a non-decreasing function $\Theta(z)$ on the continuous interval $z \in [0, 1]$. For the noiseless case, we have $\Theta(z) = F^{-1}(z)$, which is a continuously valued function on $z \in [0, 1]$. For any continuously valued pdf, $P(x)$, there is a one-to-one mapping from z to the support of $P(x)$. Furthermore, since the fraction of thresholds at or below x is then $z = F(x)$,

$$\lambda(x) = \frac{dz}{dx} = \frac{dF(x)}{dx} = P(x);$$

that is, the point density function is the pdf of the signal.

However, for nonzero noise, our numerical solutions of (2) indicate that even for very large N the accumulation points and bifurcational structure persist. Hence, if we assume that this is also the case for infinite N, there must be a transition at some $\sigma$ from a continuously valued to a discretely valued optimal $\Theta(z)$. This is why we claim that the optimal threshold distribution contains point singularities for $\sigma > 0$.

Furthermore, our numerical results indicate that the location of the m-th bifurcation tends to converge to the same value of noise as N increases. Under the assumption that this holds for infinite N, we are able to make use of an approximation to the mutual information to find the location of the final bifurcation, that is, the smallest noise intensity for which SSR is the optimal coding strategy. This approximation relies on an expression for a lower bound on the mutual information involving the Fisher information, $J(x)$, and the entropy of an efficient estimator for x. Fisher information has previously been studied in the context of SSR [...] and is a measure of how well the input signal, x, can be estimated from a set of N observations. In the limit of large N, the entropy of an efficient estimator approaches the entropy of the input signal, and if the distribution of this estimator is Gaussian, the lower bound becomes asymptotically equal to the actual mutual information:

$$I(x, y) = H(x) - 0.5\int P(x)\log_2\left(\frac{2\pi e}{J(x)}\right)dx. \qquad (3)$$
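For a zero-mean Gaussian signal with variance $\sigma_x^2$, $H(x)$ is the standard Gaussian differential entropy. Writing it as an integral against $P(x)$ (which integrates to one) and substituting into (3) cancels the $2\pi e$ terms, leaving a single integral (a worked step under that Gaussian assumption):

$$
\begin{aligned}
H(x) &= \tfrac{1}{2}\log_2\left(2\pi e\,\sigma_x^2\right) = \int P(x)\,\tfrac{1}{2}\log_2\left(2\pi e\,\sigma_x^2\right)dx,\\
I(x, y) &= \int P(x)\,\tfrac{1}{2}\left[\log_2\left(2\pi e\,\sigma_x^2\right) - \log_2\left(\frac{2\pi e}{J(x)}\right)\right]dx
= 0.5\int P(x)\log_2\left(\sigma_x^2 J(x)\right)dx.
\end{aligned}
$$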

The Fisher information for the system in Fig. 1 is the same regardless of whether the outputs of the N devices are summed or not. Hence the Fisher information can be expressed as the sum of the N individual Fisher informations [...] as

$$J(x) = \sum_{n=1}^{N} \frac{\left(\frac{dP_n(x)}{dx}\right)^2}{P_n(x)\left(1 - P_n(x)\right)},$$

which for the SSR case is identical to the expression derived for the Fisher information in [...]. For a zero-mean Gaussian input signal, (3) becomes

$$I(x, y) = 0.5\int P(x)\log_2\left(\sigma_x^2 J(x)\right)dx, \qquad (4)$$

and it is now possible to solve (2) efficiently for large N. However, we note that the condition that the distribution of y given x be approximately Gaussian holds only for noise in the vicinity of, or larger than, the final bifurcation. In this region, our empirical results for (2) show that $\nu(1, \sigma) = \nu(2, \sigma) = 0.5$, and that $\lambda(x, \sigma) = 0.5\,\delta(x - t) + 0.5\,\delta(x + t)$, where $t > 0$. Under the assumption that this holds for very large N, it is straightforward to find numerically the value of t that maximizes (3) for any given $\sigma$. This maximization finds that the asymptotic location of the final bifurcation (the first one encountered as the noise decreases from the SSR region) is at $\sigma \approx 0.91$.
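The maximization over t can be sketched numerically. Assumptions: $\sigma_x = 1$ (so the noise standard deviation equals $\sigma$), Gaussian noise, N/2 thresholds at each of $\pm t$, integrals replaced by sums on a truncated grid, and a plain grid search over t. The names `fisher_per_device`, `approx_info`, and `best_offset` are hypothetical, and this sketch has not been checked against the reported value of 0.91.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)
X_GRID = np.linspace(-6.0, 6.0, 1201)  # truncated grid for a unit-variance signal
_PX = np.exp(-0.5 * X_GRID**2)
W = _PX / _PX.sum()                    # quadrature weights for P(x) dx


def fisher_per_device(u, sigma):
    """(dP_n/dx)^2 / (P_n (1 - P_n)) for one device, with u = x - theta_n.

    erfc is used for both P_n and 1 - P_n so that neither rounds to zero in
    the tails; valid while |u| / sigma stays below roughly 35.
    """
    s = np.asarray(u, dtype=float) / sigma
    p = 0.5 * _erfc(-s / math.sqrt(2.0))                            # P_n(x)
    q = 0.5 * _erfc(s / math.sqrt(2.0))                             # 1 - P_n(x)
    dp = np.exp(-0.5 * s**2) / (math.sqrt(2.0 * math.pi) * sigma)   # dP_n/dx
    return (dp / p) * (dp / q)


def approx_info(t, sigma, n_devices=1000):
    """Eq. (4) with sigma_x = 1 and lambda(x) = 0.5 delta(x - t) + 0.5 delta(x + t)."""
    j = 0.5 * n_devices * (fisher_per_device(X_GRID - t, sigma)
                           + fisher_per_device(X_GRID + t, sigma))  # J(x) summed over devices
    return 0.5 * np.sum(W * np.log2(j))


def best_offset(sigma, t_grid=None):
    """Grid search for the half-separation t that maximizes the approximation."""
    if t_grid is None:
        t_grid = np.linspace(0.0, 2.0, 201)
    values = np.array([approx_info(t, sigma) for t in t_grid])
    k = int(np.argmax(values))
    return t_grid[k], values[k]
```

Because $J(x)$ is proportional to N in this configuration, $\log_2 N$ separates out of (4) as an additive constant, so within this approximation the maximizing t does not depend on N. Scanning `best_offset(sigma)` over $\sigma \gtrsim 0.25$ and recording where the optimum moves to t = 0 is one way to locate the final bifurcation.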

To summarize, we have shown that the optimal encoding of a Gaussian input signal by 
an array of noisy threshold devices contains point singularities in its threshold distribution, 
the number of which decreases in a series of bifurcations as the noise intensity increases. We 
have also found that for large enough noise, the optimal encoding is for all thresholds to be 
equal to the signal mean. This shows that SSR is a form of threshold-based SR that can be 
optimal. Finally, a Fisher information approach has shown that for very large population sizes, and Gaussian signal and noise, the noise intensity at which SSR becomes optimal converges to $\sigma \approx 0.91$.
FIG. 1: Array of N noisy threshold devices. Each device receives the same input signal sample,
and is subject to independent additive noise. The output from each device is unity if the sum of 
the signal and noise at its input is greater than the corresponding threshold and zero otherwise. 
The overall output, y, is the sum of the individual outputs. 
FIG. 2: Plot of optimal thresholds, $\{\theta_n^*\}$, against $\sigma$ for N = 15 and Gaussian signal and noise.
FIG. 3: Threshold point density function, $\lambda(x, \sigma)$, obtained for N = 15 and various values of $\sigma$. The y-axes give the fraction of thresholds at each accumulation point, $\nu(j, \sigma)$, and the x-axis gives the threshold values, $x = \Theta_j$.